Finding a unifying motif of intermolecular cooperativity in protein associations 
At the molecular level, most biological processes entail protein associations, which in turn rely on
a small fraction of interfacial residues called hot spots. Here we show that hot spots share a unifying
molecular attribute: they provide a third-body contribution to intermolecular cooperativity. This
motif, based on the wrapping of interfacial electrostatic interactions, is essential to maintain the
integrity of the interface and can be exploited in rational drug design, since such regions may serve
as blueprints to engineer small molecules disruptive of protein-protein interfaces.



PACS numbers: 87.15.km, 87.15.kr, 87.15.K- 

Protein associations are basic molecular processes in biology. In spite of their importance,
their biophysical underpinnings remain a subject of debate. A challenging standing problem
involves the characterization of hot spots. These are few in number and provide the most
significant contribution to the stability of the protein-protein interface. Knowledge-based and
first-principles docking potentials have been relatively successful at predicting these singular
sites, fitting the outcome of probes for experimental identification such as site-directed
mutation or alanine scanning. These techniques assess the impact on binding free energy of the
truncation of an individual residue side chain at the β-carbon. Notwithstanding these predictive
successes, the physical nature of hot spots remains elusive. Even the establishment of general
rules for hot-spot characterization has proven unfeasible so far, as has been explicitly
recognized [5]; establishing such rules constitutes the scope of this letter. Attempts at
rationalizing the stability of protein-protein interfaces based on pairwise interactions between
the two chains are inconclusive at best, as demonstrated in this letter. This leads us to focus
our attention on higher-order energetic contributions as a theoretical framework to explain and
predict binding hot spots. Given the relative abundance of hydrophilic residues on the protein
surface, protein associations are always confronted with the disruptive effect of polar
hydration [15]. Thus, the integrity of the protein-protein interface becomes extremely reliant
on intermolecular cooperativity [14, 15].
We make this concept precise by invoking three-body correlations, whereupon a third nonpolar
body protects an electrostatic interaction pairing the other two by contributing to the
exclusion of surrounding water. Since these three-body correlations must engage the two protein
molecules, the correlations must be subject to an additional constraint: one body belongs to a
protein chain and the other two to its binding partner. To complete this description it is
necessary to classify pairwise electrostatic interactions in terms of an abundance distribution
$P(p)$, where $p$ is the number of three-body correlations associated with an interaction. This
distribution is characterized by its mean value $\langle p \rangle = \sum_p p\,P(p)$ and
dispersion $\sigma = \langle (p - \langle p \rangle)^2 \rangle^{1/2}$, which leads us to single
out an underprotected interaction (UPI) as one in the tail of the distribution, that is, with
$p < \langle p \rangle - \sigma$. The UPIs are crucial in defining protein associations due to
their sensitivity to critical changes in intermolecular cooperativity brought about by
site-directed amino acid substitution. As demonstrated previously, UPIs are also adhesive, hence
promoters of protein association, because their inherent stability increases upon approach of a
third-body nonpolar group that enhances their dehydration and de-screens the partial charges.
This physical picture leads us to assert that intermolecular cooperativity will be most
sensitive to site-directed mutation in two particular instances: a) when a site mutation changes
the wrapping value $p$ of an intermolecular or intramolecular interaction, decreasing it to a
value below the mean $\langle p \rangle$; b) when, in a free protein subunit, the alanine
substitution raises the wrapping of a UPI and, additionally, this interaction is
intermolecularly wrapped in the complex. This analysis leads us to characterize hot spots as the
residues whose alanine substitution most drastically affects intermolecular cooperativity. This
conjecture is validated in this work by combinatorially dissecting the protein-protein
interfaces of structurally reported complexes that have been independently studied by
experimental alanine scanning. The analysis boils down to a decomposition of the interface into
a web of three-body cooperative interactions. Besides its scientific interest, the knowledge
gained from our approach may significantly impact drug discovery endeavors [18], especially
since hot spots are expected to constitute the blueprint for the design of small-molecule drugs
disruptive of protein-protein associations.
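
As an illustration of the tail criterion $p < \langle p \rangle - \sigma$ stated above, the following minimal Python sketch flags underprotected interactions in a set of wrapping values. The function name `find_upis`, the interaction labels and the numerical values are hypothetical and serve only to show the arithmetic; the dispersion is computed as a population standard deviation, which is our reading of the definition in the text.

```python
from statistics import mean, pstdev


def find_upis(wrapping):
    """Return (mean, sigma, upis) for a dict {interaction_id: p}.

    An interaction is flagged as underprotected (UPI) when its
    wrapping p lies in the lower tail: p < <p> - sigma.
    """
    values = list(wrapping.values())
    p_mean = mean(values)
    # Population dispersion, i.e. <(p - <p>)^2>^(1/2) over the sample.
    sigma = pstdev(values)
    upis = sorted(k for k, p in wrapping.items() if p < p_mean - sigma)
    return p_mean, sigma, upis


# Hypothetical wrapping values for six interactions (illustrative only).
example = {"hb1": 30, "hb2": 28, "hb3": 27, "hb4": 25, "hb5": 26, "hb6": 12}
p_mean, sigma, upis = find_upis(example)
print(f"mean = {p_mean:.2f}, sigma = {sigma:.2f}, UPIs = {upis}")
```

In this toy set only the interaction with $p = 12$ falls below the tail threshold.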

UPIs that involve hydrogen bonds (HBs) are named dehydrons. This structural motif has been
extensively discussed in the literature and identified in soluble proteins with PDB-reported
structure [15-17]. Thus, the extent of hydrogen-bond protection can be determined directly from
atomic coordinates. This parameter indicates the number of three-body correlations engaging the
HB; it is also known as the wrapping of the bond and is denoted $p$. It is given by the number
of side-chain carbonaceous nonpolar groups (CH$_n$, $n = 0, 1, 2, 3$, where the carbon atom of
these groups is not bonded to an electrophilic atom or polarized group) contained within a
desolvation domain around the HB. Each wrapping nonpolar group represents the third body within
a three-body correlation involving the HB. This domain is typically defined as the union of two
intersecting spheres of fixed radius (the thickness of three water layers) centered at the
α-carbons of the residues paired by the hydrogen bond. In structures of PDB-reported soluble
proteins, backbone hydrogen bonds (BHBs) are protected on average by $p = 26.6 \pm 7.5$
side-chain nonpolar groups for a desolvation sphere of radius 6 Å. Thus, structural deficiencies
lie in the tail of the $p$-distribution, i.e., their microenvironment contains 19 or fewer
nonpolar groups, so their $p$-value is below the mean (26.6) minus one standard deviation (7.5).
While the statistics on $p$-values for backbone hydrogen bonds vary with the radius, the tails
of the distribution remain invariant, thus enabling a robust identification of structural
deficiencies [14-17]. In the present work we are dealing with protein complexes and accordingly
we compute the $p$-values arising from intra- and intermolecular correlations. Additionally, we
consider both intramolecular and (less frequent) intermolecular BHBs. The algorithm to identify
dehydrons, named "Dehydron Calculator", is freely accessible from the Web at the following
location:
http://www.owlnet.rice.edu/~arifer/courses/DehydronC
The wrapping concept may be spatially represented as shown in Fig. 1, where two different types
of three-body correlations are illustrated. Figure 1(a) shows an instance of intermolecular
wrapping of an intramolecular HB, while Fig. 1(b) shows the wrapping of an intermolecular HB.
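
The wrapping count described above can be sketched in a few lines of Python. This is a minimal illustration and not the Dehydron Calculator itself: the function name `wrapping`, the toy coordinates, and the choice to treat a group lying exactly on the sphere surface as inside are assumptions, and the list of nonpolar carbons is assumed to have been pre-filtered (side-chain CH$_n$ groups not bonded to an electrophilic atom or polarized group).

```python
import math

DEHYDRON_MAX = 19  # "19 or fewer" nonpolar groups, as stated in the text


def wrapping(ca_i, ca_j, nonpolar_carbons, radius=6.0):
    """Count nonpolar carbon groups inside the desolvation domain.

    The domain is the union of two spheres of the given radius (in Å)
    centered at the alpha-carbons ca_i and ca_j of the paired residues.
    """
    return sum(
        1
        for c in nonpolar_carbons
        if math.dist(c, ca_i) <= radius or math.dist(c, ca_j) <= radius
    )


# Toy coordinates in Å (hypothetical, for illustration only).
ca_i, ca_j = (0.0, 0.0, 0.0), (5.0, 0.0, 0.0)
carbons = [
    (1.0, 1.0, 1.0),   # inside the sphere around ca_i
    (9.0, 0.0, 0.0),   # inside the sphere around ca_j only
    (3.0, 5.0, 0.0),   # about 5.8 Å from ca_i, inside
    (12.0, 0.0, 0.0),  # 7 Å from ca_j, outside both spheres
    (0.0, 0.0, 6.5),   # outside both spheres
]
p = wrapping(ca_i, ca_j, carbons)
print(p, "dehydron" if p <= DEHYDRON_MAX else "well wrapped")
```

For a complex, the same count would be run once with the carbons of a single chain (intramolecular wrapping) and once with the carbons of both chains.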

Our virtual alanine-scanning procedure is performed by computationally replacing each residue
of a protein chain (one at a time) with alanine within the 3D structure of the complex and
assessing the impact of the substitution on intermolecular cooperativity. For most residues
(those with a side chain larger than that of alanine) this means truncating the residue side
chain at the β-carbon so that the whole side chain is replaced by a methyl group, thus
significantly reducing the extent of wrapping involving the residue. In the special case of
glycine (which lacks a β-carbon) we include a methyl at the corresponding position, increasing
the extent of wrapping enabled by the residue. The in silico scanning process entails computing
the change in $p$-value generated by each Ala substitution on each intra- and intermolecular BHB
of the complex.

FIG. 1. Illustration of intermolecular cooperativity represented by three-body correlations:
a) Trp 169 (full atomic detail) of hGHbp (red chain) wrapping three intramolecular BHBs of the
hGH chain (blue chain). The BHBs of the hGH chain are indicated by white sticks between the
corresponding α-carbons; b) Similar to a) but for the complex between the HIV glycoprotein
gp120 and the CD4 receptor. Here a Trp residue of the CD4 chain wraps an intermolecular BHB.

In a first stage, we calculate the $p$-value for all BHBs from the complex structure, producing
a set of wild-type $p$-values. For each mutated residue we perform the corresponding Ala
substitution, leaving all other coordinates unchanged, and we recalculate the full set of
$p$-values (mutated $p$-values). Then, in accord with our premise of intermolecular
cooperativity, hot spots are predicted taking into account their role as intermolecular wrappers
according to the following classes: a) The Ala substitution of a residue on one chain lowers the
$p$-value of a BHB (an intramolecular BHB in the partner protein or an intermolecular BHB) and
the mutated $p$-value of this BHB falls below $\langle p \rangle$. These predicted hot spots
will be labeled class A hot spots. In the cases where the final $p$-value falls below the
dehydron threshold, $p = 19$ (dehydron creation), these A-class hot spots will be labeled A*;
b) Alanine substitution increases the wrapping capability of a non-wrapper residue (glycine,
serine, cysteine, aspartic acid or asparagine) located within the desolvation environment of a
BHB of its own protein chain. In addition, the intramolecular wrapping value of the BHB is
$p \le 19$ and this BHB is intermolecularly wrapped within the complex. These alanine
substitutions raise the intramolecular $p$-value by $\Delta p = +1$. These alterations lower the
need for intermolecular wrapping upon association and the resulting predictions will be labeled
class B hot spots. In the cases when the intramolecular wrapping value of the BHB is exactly
$p = 19$, we will denote these B*-class hot spots. This sub-class implies that the Ala
substitution is indicative of a net intramolecular removal of a dehydron.

We decided to leave aside side chain-side chain hydrogen bonds from the cooperativity analysis
on the following grounds: the fluctuational nature of surface side chains imposes an entropic
cost associated with HB formation, which makes the latter marginally stable at best [13]. Also,
the wrapping statistics for side-chain HBs are essentially flat, with no clear distinction of
the tails of the distribution, due to the conformational richness of the side chains. An a
posteriori justification for the exclusion arises from the very artifactual nature of surface
side-chain HBs. Particularly misleading are the large B-factors of solvent-exposed side chains
and the large hydration demands of exposed polar groups, which hinder HB formation. These
artifacts would yield an overwhelming number of false positives in the cooperativity analysis
of the protein-protein interface (most interfacial residues would be hot spots).
In turn, we shall not take into account salt bridges in our analysis, since they are not
expected to significantly stabilize protein structure. These bridges are destabilizing with
respect to hydrophobic replacement of both charged partners, and charge burial has been shown
to be usually destabilizing. However, it is also known that for a pair of complementary buried
charges it is preferable for them to be paired by a salt bridge than to be buried isolated from
each other [19]. Thus, an Ala mutation of a residue engaged in a salt bridge with its complex
partner protein would be destabilizing. This trivial type of hot spot accounts for
approximately 15% of all the hot spots in the complexes considered and obviously lies outside
the scope of our cooperativity-based analysis.
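
The two combinatorial rules (classes A and B, with their starred sub-classes) can be summarized in a short Python sketch. It is a hedged illustration, not our production code: the function names are hypothetical, the mean 26.6 is the value quoted for soluble proteins at a 6 Å radius, and reading "below the dehydron threshold" as "19 or fewer" is an assumption taken from the dehydron definition given earlier. The wild-type and mutated $p$-values are assumed to have been computed beforehand.

```python
MEAN_P = 26.6       # <p> quoted in the text for a 6 Å desolvation radius
DEHYDRON_MAX = 19   # assumption: a dehydron has 19 or fewer wrappers
NON_WRAPPERS = {"GLY", "SER", "CYS", "ASP", "ASN"}  # residues listed for class B


def classify_class_a(p_wt, p_mut):
    """Class A: the Ala substitution lowers the p-value of a BHB that is
    either intramolecular in the partner chain or intermolecular."""
    if p_mut >= p_wt or p_mut >= MEAN_P:
        return None  # wrapping not lowered, or still at or above the mean
    if p_mut <= DEHYDRON_MAX:
        return "A*"  # final p-value in the dehydron range
    return "A"


def classify_class_b(resname, p_intra, intermolecularly_wrapped):
    """Class B: the Ala substitution of a non-wrapper residue adds one
    wrapper (delta p = +1) to a poorly wrapped BHB of its own chain that
    is intermolecularly wrapped in the complex."""
    if resname not in NON_WRAPPERS or not intermolecularly_wrapped:
        return None
    if p_intra > DEHYDRON_MAX:
        return None
    # With p exactly 19, the extra methyl lifts the bond out of the dehydron range.
    return "B*" if p_intra == DEHYDRON_MAX else "B"


print(classify_class_a(p_wt=21, p_mut=18))        # A*
print(classify_class_a(p_wt=30, p_mut=24))        # A
print(classify_class_b("GLY", 19, True))          # B*
print(classify_class_b("SER", 15, True))          # B
```

A residue would be reported as a predicted hot spot if any BHB in its neighborhood yields a non-empty label under either rule.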

We performed a cooperativity-based alanine-scanning analysis on several protein-protein
interfaces from complexes with PDB-reported structure for which experimental alanine-scanning
results are available [3]. In each case, the first protein of the complex indicated is the one
mutated, and we provide the PDB entry of the complex and the reference for the experimental
alanine-scanning results: Human growth hormone receptor/Human growth hormone [...];
[...] inhibitor/Beta-trypsin (2PTC); [...] (1YCR); CD4/GP120 [22] (1GC1);
Ribonuclease inhibitor/Ribonuclease A [23] (1DFJ);
[...] immunity protein/Colicin E9 DNase domain [24] (1BXI); Barnase/Barstar [25] (1BRS);
Barstar/Barnase [25] (1BRS); Ribonuclease inhibitor/Angiogenin (1A4Y).

Figure 2 displays our predictions. The experimental alanine substitution of a native protein
subunit yields a change in its binding free energy ($\Delta G$), denoted by
$\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}}$ (mut = mutated, wt = wild
type) and indicated with a color scale. The cooperativity-based hot-spot predictions of our
method are indicated with gray squares below the corresponding residues and are denoted by A,
A*, B and B*. The letter "S" labels trivial salt-bridge hot spots, which are removed from the
list of experimental hot spots used for the comparison with our computational method.

FIG. 2. Experimental alanine-scanning probes contrasted against cooperativity-based in silico
scanning for the complexes indicated. For each case we display the portion of the protein chain
or the set of residues with experimental data. The colors indicate the experimentally
determined $\Delta\Delta G$ values for the corresponding hot spots, as shown in the scale at
the right. The gray squares indicate our computational predictions, and the letter code is
explained in the text.



TABLE I. Predictions obtained for the different protein complexes studied (see Table [...]).

Experimental hot spots       Prediction success (percentages)
($\Delta\Delta G$ value)     A+A*+B+B*     A+A*     A*+B*
> 4 kcal/mol                 89            61       56
> 3 kcal/mol                 83            58       50
> 2 kcal/mol                 79            54       46
> 1 kcal/mol                 74            53       37


To quantify the predictive ability of our method, Table I shows our global predictions over
the whole set of protein complexes studied.
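
To make explicit how a success percentage of this kind can be tallied, the following Python sketch counts, among the experimental hot spots above a given $\Delta\Delta G$ threshold, the fraction that received a prediction label in a chosen set of classes. The function name `success_rate`, the record layout and all numerical values are hypothetical; in particular, the toy data below are not the data behind Table I. Salt-bridge hot spots (label "S") are dropped before counting, as described for the comparison.

```python
ALL_CLASSES = {"A", "A*", "B", "B*"}


def success_rate(hot_spots, threshold, labels):
    """Percentage of experimental hot spots with ddG > threshold
    (kcal/mol) whose predicted label belongs to `labels`.

    Records labeled "S" (trivial salt-bridge hot spots) are excluded.
    Returns None when no hot spot passes the threshold.
    """
    selected = [
        h for h in hot_spots if h["ddg"] > threshold and h["label"] != "S"
    ]
    if not selected:
        return None
    hits = sum(1 for h in selected if h["label"] in labels)
    return 100.0 * hits / len(selected)


# Hypothetical records: experimental ddG and the label assigned in silico
# (None means the residue was not predicted).
toy = [
    {"ddg": 4.5, "label": "A*"},
    {"ddg": 4.2, "label": "B"},
    {"ddg": 4.1, "label": "S"},
    {"ddg": 3.1, "label": None},
    {"ddg": 2.5, "label": "A"},
    {"ddg": 1.2, "label": None},
]
for t in (4, 3, 2, 1):
    print(t, success_rate(toy, t, ALL_CLASSES), success_rate(toy, t, {"A", "A*"}))
```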

This comparison between theory and experiment reveals that our computational procedure locates
most of the experimental alanine-scanning hot spots, with optimal performance (89% prediction
success) for the most significant contributors determined experimentally
($\Delta\Delta G > 4$ kcal/mol). The greatest contribution to this percentage, 61%, corresponds
to class A mutations (A and A*), while class B (B and B*) provides the remaining 28%. The last
column of the table indicates the predictions when considering only dehydron creation, A*, and
dehydron removal, B*. In consonance with our cooperativity premises, these cases are expected
to constitute very important mutations, and this is in fact the case, since such mutations
account for 56% of the highly energetic mutations determined experimentally
($\Delta\Delta G > 4$ kcal/mol). Additionally, the wild-type $p$-values averaged over the
residues wrapped in class A hot spots yield $p = 20.3$, a value higher than the dehydron
threshold ($p = 19$). However, when we average the mutated $p$-values we get a final $p = 18$,
that is, below the dehydron threshold. Thus, the dehydron threshold is in fact statistically
framed by the averaged wild-type and mutated $p$-values for A-class hot spots, revealing the
relevance of the qualitative wrapping differences for protein affinity.

At this point it is worth recalling that our
method disregards two-body terms unless they are engaged in a three-body correlation. This
approach seems natural in view of the fact that no protein-protein interface has proven trivial
at the conventional pairwise level of analysis, and given the absence of clear rules for
hot-spot prediction. This last point also makes it difficult to establish a control for our
results, but we have nonetheless defined an elementary one based on polar and hydrophobic
complementarities. To this end, we have simply characterized residues as hydrophobic (nonpolar
aromatic or aliphatic side chains) or polar (polar or charged side chains) and built a contact
matrix for the complex interface. For each residue we calculated the minimum distance between
its α-carbon and the α-carbons of the residues of the partner protein, and between the centroid
of its side chain and those of the partner side chains. An intermolecular contact was
considered to occur when this minimum distance was below 6 Å (the results are robust to
moderate changes in the contact parameter and fit a criterion previously adopted for
protein-protein interfaces). We applied this analysis to the hGH/hGHbp complex interface, which
yielded a significant level of mismatches (around 37%), thus indicating that the protein
association cannot be simply rationalized as a search for pairwise polar-polar and
hydrophobic-hydrophobic complementarity. More interestingly, when we restrict the analysis to
the experimentally determined hot spots, the percentage of mismatches is slightly higher (42%).
And if we look at the two most important hot spots (Trp 104 and Trp 169), these residues are
involved in 8 mismatches and only 1 hydrophobic-hydrophobic contact. This level of mismatching
seems unavoidable given the high polar content at the protein surface, which becomes buried
upon creation of the complex. However, when we focus on three-body interactions, we discover
that many hydrophobic residues at the complex interface approach polar residues in order to
wrap BHBs in which the latter are involved.
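
The pairwise control described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the set of residues counted as hydrophobic, the record layout (`name`, `ca`, `centroid`) and the toy coordinates are hypothetical, and a pair is taken to be in contact when either its α-carbon distance or its side-chain centroid distance is below the cutoff, which is our reading of the criterion in the text.

```python
import math

# Assumption: residues treated as hydrophobic (nonpolar aromatic or aliphatic).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}


def is_contact(res_a, res_b, cutoff=6.0):
    """Intermolecular contact: the smaller of the alpha-carbon distance and
    the side-chain centroid distance is below the cutoff (in Å)."""
    d_ca = math.dist(res_a["ca"], res_b["ca"])
    d_sc = math.dist(res_a["centroid"], res_b["centroid"])
    return min(d_ca, d_sc) < cutoff


def mismatch_fraction(chain_a, chain_b, cutoff=6.0):
    """Fraction of interfacial contacts pairing a hydrophobic residue
    with a polar one (a complementarity mismatch)."""
    contacts = [
        (a, b) for a in chain_a for b in chain_b if is_contact(a, b, cutoff)
    ]
    if not contacts:
        return 0.0
    mismatches = sum(
        1
        for a, b in contacts
        if (a["name"] in HYDROPHOBIC) != (b["name"] in HYDROPHOBIC)
    )
    return mismatches / len(contacts)


# Toy interface (hypothetical coordinates in Å).
chain_a = [{"name": "TRP", "ca": (0.0, 0.0, 0.0), "centroid": (0.0, 0.0, 2.0)}]
chain_b = [
    {"name": "ASP", "ca": (4.0, 0.0, 0.0), "centroid": (4.0, 0.0, 2.0)},    # mismatch
    {"name": "LEU", "ca": (0.0, 5.0, 0.0), "centroid": (0.0, 5.0, 2.0)},    # match
    {"name": "LYS", "ca": (20.0, 0.0, 0.0), "centroid": (20.0, 0.0, 2.0)},  # no contact
]
fraction = mismatch_fraction(chain_a, chain_b)
print(f"mismatch fraction = {fraction:.2f}")
```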

To summarize, this letter has shown that protein-protein interfaces elude standard
physico-chemical analysis. Their rationalization in terms of pairwise complementarity along the
contact region is unsatisfactory, especially when it comes to understanding the role of hot
spots as determinants of protein associations. Against this background, this work unravels a
seemingly overlooked, simple molecular motif that proves to be ubiquitous in determining
protein-protein associations. This motif is an indicator of three-body intermolecular
cooperativity. In essence, such effects arise as a group in one protein chain stabilizes
(wraps) a preformed hydrogen bond in the partner chain or an inter-chain hydrogen bond, so that
three bodies intervene in the interaction and not all three belong to the same chain. We have
shown that hot-spot predictions based solely on this molecular attribute, and defined by two
purely combinatorial rules based on structural analysis of protein complexes, account for most
of the hot spots experimentally determined by alanine scanning in a set of protein complexes
(89% of those with $\Delta\Delta G > 4$ kcal/mol). Thus, the simplicity of our method contrasts
with the complexity of approaches based on full-fledged potentials with explicit water (where
many-body terms are subsumed in all-atom interactions). We do not deny the relevance of these
predictive methods, but such avenues have not proven enlightening in terms of identifying clear
molecular promoters of protein associations. By contrast, the results presented in this work
meet this need and might be instrumental in the design of small molecules aimed at disrupting
protein-protein interfaces by fulfilling the wrapping capabilities of hot spots. We believe
that our combinatorial identification of a molecular promoter of protein associations holds
promise as a guide to rational drug design.